Overview

Dataset statistics

Statistic                        Dataset A    Dataset B
Number of variables              12           12
Number of observations           446          446
Missing cells                    421          428
Missing cells (%)                7.9%         8.0%
Duplicate rows                   0            0
Duplicate rows (%)               0.0%         0.0%
Total size in memory             45.3 KiB     45.3 KiB
Average record size in memory    104.0 B      104.0 B
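
The missing-cell percentages are consistent with dividing the missing-cell count by the total number of cells (observations × variables). The sketch below reproduces that arithmetic; the formula is inferred from the reported figures rather than taken from the tool, and `missing_cells_pct` is an illustrative name, not a ydata-profiling function.

```python
def missing_cells_pct(missing_cells: int, n_rows: int, n_cols: int) -> float:
    """Share of missing cells, as a percentage of all cells in the table."""
    return 100.0 * missing_cells / (n_rows * n_cols)


# Counts taken from the dataset statistics (446 observations, 12 variables).
for name, missing in (("Dataset A", 421), ("Dataset B", 428)):
    print(f"{name}: {missing_cells_pct(missing, 446, 12):.1f}%")
```

With the counts reported here this prints 7.9% for Dataset A and 8.0% for Dataset B, matching the table.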

Variable types

Type           Dataset A    Dataset B
Numeric        5            5
Categorical    4            4
Text           3            3

Alerts

Alert type          Dataset A                                         Dataset B
High Correlation    Parch is highly overall correlated with SibSp     Alert not present in this dataset
High Correlation    Sex is highly overall correlated with Survived    Sex is highly overall correlated with Survived
High Correlation    SibSp is highly overall correlated with Parch     Alert not present in this dataset
High Correlation    Survived is highly overall correlated with Sex    Survived is highly overall correlated with Sex
Missing             Age has 76 (17.0%) missing values                 Age has 89 (20.0%) missing values
Missing             Cabin has 345 (77.4%) missing values              Cabin has 338 (75.8%) missing values
Unique              PassengerId has unique values                     PassengerId has unique values
Unique              Name has unique values                            Name has unique values
Zeros               SibSp has 298 (66.8%) zeros                       SibSp has 304 (68.2%) zeros
Zeros               Parch has 332 (74.4%) zeros                       Parch has 337 (75.6%) zeros
Zeros               Fare has 6 (1.3%) zeros                           Fare has 8 (1.8%) zeros
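
The percentages quoted in the Missing and Zeros alerts are consistent with dividing each count by the number of observations (446 per dataset). The sketch below checks this under that assumption; `pct_of_rows` is an illustrative helper, not part of the profiling tool.

```python
N_ROWS = 446  # observations in each dataset, per the dataset statistics


def pct_of_rows(count: int, n_rows: int = N_ROWS) -> str:
    """Format a row count as a percentage of all rows, to one decimal place."""
    return f"{100.0 * count / n_rows:.1f}%"


# (variable, alert type) -> counts for Dataset A and Dataset B, from the alerts
alerts = {
    ("Age", "Missing"): (76, 89),
    ("Cabin", "Missing"): (345, 338),
    ("SibSp", "Zeros"): (298, 304),
    ("Parch", "Zeros"): (332, 337),
    ("Fare", "Zeros"): (6, 8),
}

for (variable, kind), (count_a, count_b) in alerts.items():
    print(f"{variable} {kind}: A={pct_of_rows(count_a)}  B={pct_of_rows(count_b)}")
```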

Reproduction

Field                     Dataset A                     Dataset B
Analysis started          2024-09-07 10:04:45.831026    2024-09-07 10:04:49.001479
Analysis finished         2024-09-07 10:04:48.998110    2024-09-07 10:04:52.186942
Duration                  3.17 seconds                  3.19 seconds
Software version          ydata-profiling v0.0.dev0     ydata-profiling v0.0.dev0
Download configuration    config.json                   config.json
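
The reported durations can be checked against the start and finish timestamps. The sketch below assumes the duration is simply finish minus start; the timestamp format string and the helper name `duration_seconds` are illustrative choices.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed seconds between two timestamps in the report's format."""
    start = datetime.strptime(started, TIMESTAMP_FORMAT)
    end = datetime.strptime(finished, TIMESTAMP_FORMAT)
    return (end - start).total_seconds()


# Timestamps copied from the Reproduction section.
runs = {
    "Dataset A": ("2024-09-07 10:04:45.831026", "2024-09-07 10:04:48.998110"),
    "Dataset B": ("2024-09-07 10:04:49.001479", "2024-09-07 10:04:52.186942"),
}
for name, (started, finished) in runs.items():
    print(f"{name}: {duration_seconds(started, finished):.2f} seconds")
```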

Variables

PassengerId
Real number (ℝ)

Statistic       Dataset A    Dataset B
Distinct        446          446
Distinct (%)    100.0%       100.0%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            446.81166    440.6009
Minimum         3            1
Maximum         891          890
Zeros           0            0
Zeros (%)       0.0%         0.0%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

Statistic                    Dataset A    Dataset B
Minimum                      3            1
5-th percentile              48           48.25
Q1                           216.25       227.25
median                       440.5        420.5
Q3                           680.75       660.75
95-th percentile             850.75       847.75
Maximum                      891          890
Range                        888          889
Interquartile range (IQR)    464.5        433.5
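
Range and interquartile range follow directly from the other quantiles: range is maximum minus minimum, and IQR is Q3 minus Q1. The sketch below recomputes both from the PassengerId figures; `spread` is an illustrative helper name.

```python
def spread(minimum: float, q1: float, q3: float, maximum: float) -> tuple:
    """Return (range, interquartile range) from four summary quantiles."""
    return maximum - minimum, q3 - q1


# PassengerId quantiles copied from the report.
passenger_id = {
    "Dataset A": dict(minimum=3, q1=216.25, q3=680.75, maximum=891),
    "Dataset B": dict(minimum=1, q1=227.25, q3=660.75, maximum=890),
}
for name, quantiles in passenger_id.items():
    value_range, iqr = spread(**quantiles)
    print(f"{name}: range={value_range}, IQR={iqr}")
```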

Descriptive statistics

Statistic                          Dataset A        Dataset B
Standard deviation                 261.04917        257.07717
Coefficient of variation (CV)      0.58424879       0.58346947
Kurtosis                           -1.258845        -1.176954
Mean                               446.81166        440.6009
Median Absolute Deviation (MAD)    233              221
Skewness                           0.041702545      0.074890775
Sum                                199278           196508
Variance                           68146.67         66088.672
Monotonicity                       Not monotonic    Not monotonic
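
Several of the descriptive statistics are tied together by simple identities: mean = sum / count, CV = standard deviation / mean, and variance = standard deviation squared. The sketch below checks the PassengerId figures against those identities, within rounding of the printed values. The helper `derived_stats` is illustrative, and the count of 446 assumes no missing values, as reported for this variable.

```python
import math


def derived_stats(total: float, n: int, std: float) -> dict:
    """Mean, coefficient of variation and variance from sum, count and std."""
    mean = total / n
    return {"mean": mean, "cv": std / mean, "variance": std ** 2}


# PassengerId: sum, number of non-missing values, standard deviation.
dataset_a = derived_stats(total=199278, n=446, std=261.04917)
dataset_b = derived_stats(total=196508, n=446, std=257.07717)

# Compare with the reported values, allowing for rounding in the report.
print(math.isclose(dataset_a["cv"], 0.58424879, abs_tol=1e-6))
print(math.isclose(dataset_b["variance"], 66088.672, abs_tol=1e-2))
```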
[Figure: Histogram with fixed size bins (bins=50)]
Dataset A
Value                 Count    Frequency (%)
204                   1        0.2%
299                   1        0.2%
403                   1        0.2%
889                   1        0.2%
833                   1        0.2%
442                   1        0.2%
108                   1        0.2%
672                   1        0.2%
389                   1        0.2%
152                   1        0.2%
Other values (436)    436      97.8%

Dataset B
Value                 Count    Frequency (%)
843                   1        0.2%
794                   1        0.2%
170                   1        0.2%
325                   1        0.2%
764                   1        0.2%
141                   1        0.2%
444                   1        0.2%
332                   1        0.2%
857                   1        0.2%
568                   1        0.2%
Other values (436)    436      97.8%

Dataset A
Value    Count    Frequency (%)
3        1        0.2%
6        1        0.2%
8        1        0.2%
10       1        0.2%
11       1        0.2%
14       1        0.2%
15       1        0.2%
16       1        0.2%
18       1        0.2%
19       1        0.2%

Dataset B
Value    Count    Frequency (%)
1        1        0.2%
3        1        0.2%
6        1        0.2%
8        1        0.2%
9        1        0.2%
11       1        0.2%
12       1        0.2%
14       1        0.2%
15       1        0.2%
16       1        0.2%

Survived
Categorical

Statistic       Dataset A    Dataset B
Distinct        2            2
Distinct (%)    0.4%         0.4%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Length

Statistic        Dataset A    Dataset B
Max length       1            1
Median length    1            1
Mean length      1            1
Min length       1            1

Characters and Unicode

Statistic              Dataset A    Dataset B
Total characters       446          446
Distinct characters    2            2
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic     Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

Row        Dataset A    Dataset B
1st row    1            0
2nd row    0            0
3rd row    0            0
4th row    0            1
5th row    0            0

Common Values

Dataset A
Value    Count    Frequency (%)
0        283      63.5%
1        163      36.5%

Dataset B
Value    Count    Frequency (%)
0        272      61.0%
1        174      39.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common-values plots for Dataset A and Dataset B]
ValueCountFrequency (%)
0 283
63.5%
1 163
36.5%
ValueCountFrequency (%)
0 272
61.0%
1 174
39.0%

Most occurring characters

ValueCountFrequency (%)
0 283
63.5%
1 163
36.5%
ValueCountFrequency (%)
0 272
61.0%
1 174
39.0%

Most occurring categories

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
0 283
63.5%
1 163
36.5%
ValueCountFrequency (%)
0 272
61.0%
1 174
39.0%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
0 283
63.5%
1 163
36.5%
ValueCountFrequency (%)
0 272
61.0%
1 174
39.0%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
0 283
63.5%
1 163
36.5%
ValueCountFrequency (%)
0 272
61.0%
1 174
39.0%

Pclass
Categorical

Statistic       Dataset A    Dataset B
Distinct        3            3
Distinct (%)    0.7%         0.7%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Length

Statistic        Dataset A    Dataset B
Max length       1            1
Median length    1            1
Mean length      1            1
Min length       1            1

Characters and Unicode

Statistic              Dataset A    Dataset B
Total characters       446          446
Distinct characters    3            3
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic     Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

Row        Dataset A    Dataset B
1st row    1            1
2nd row    3            3
3rd row    3            3
4th row    3            1
5th row    3            3

Common Values

Dataset A
Value    Count    Frequency (%)
3        252      56.5%
1        107      24.0%
2        87       19.5%

Dataset B
Value    Count    Frequency (%)
3        236      52.9%
1        118      26.5%
2        92       20.6%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common-values plots for Dataset A and Dataset B]
ValueCountFrequency (%)
3 252
56.5%
1 107
24.0%
2 87
 
19.5%
ValueCountFrequency (%)
3 236
52.9%
1 118
26.5%
2 92
 
20.6%

Most occurring characters

ValueCountFrequency (%)
3 252
56.5%
1 107
24.0%
2 87
 
19.5%
ValueCountFrequency (%)
3 236
52.9%
1 118
26.5%
2 92
 
20.6%

Most occurring categories

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
3 252
56.5%
1 107
24.0%
2 87
 
19.5%
ValueCountFrequency (%)
3 236
52.9%
1 118
26.5%
2 92
 
20.6%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
3 252
56.5%
1 107
24.0%
2 87
 
19.5%
ValueCountFrequency (%)
3 236
52.9%
1 118
26.5%
2 92
 
20.6%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 446
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
3 252
56.5%
1 107
24.0%
2 87
 
19.5%
ValueCountFrequency (%)
3 236
52.9%
1 118
26.5%
2 92
 
20.6%

Name
Text

Statistic       Dataset A    Dataset B
Distinct        446          446
Distinct (%)    100.0%       100.0%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Length

Statistic        Dataset A    Dataset B
Max length       61           82
Median length    49           49
Mean length      26.681614    26.804933
Min length       13           13

Characters and Unicode

Statistic              Dataset A    Dataset B
Total characters       11900        11955
Distinct characters    59           59
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic     Dataset A    Dataset B
Unique        446          446
Unique (%)    100.0%       100.0%

Sample

Row        Dataset A                                   Dataset B
1st row    Saalfeld, Mr. Adolphe                       Hoyt, Mr. William Fisher
2nd row    Jussila, Miss. Mari Aina                    Ling, Mr. Lee
3rd row    Johnston, Miss. Catherine Helen "Carrie"    Sage, Mr. George John Jr
4th row    Saad, Mr. Amin                              Carter, Mrs. William Ernest (Lucile Polk)
5th row    Hampe, Mr. Leon                             Boulos, Mrs. Joseph (Sultana)
ValueCountFrequency (%)
mr 270
 
15.0%
miss 87
 
4.8%
mrs 60
 
3.3%
william 25
 
1.4%
john 21
 
1.2%
master 17
 
0.9%
charles 17
 
0.9%
george 13
 
0.7%
henry 12
 
0.7%
james 11
 
0.6%
Other values (908) 1269
70.4%
ValueCountFrequency (%)
mr 253
 
14.0%
miss 99
 
5.5%
mrs 62
 
3.4%
william 35
 
1.9%
john 19
 
1.0%
master 18
 
1.0%
henry 17
 
0.9%
george 16
 
0.9%
james 13
 
0.7%
thomas 12
 
0.7%
Other values (895) 1266
69.9%

Most occurring characters

ValueCountFrequency (%)
(space) 1357
 
11.4%
r 948
 
8.0%
a 823
 
6.9%
e 810
 
6.8%
n 678
 
5.7%
i 662
 
5.6%
s 650
 
5.5%
M 554
 
4.7%
l 537
 
4.5%
o 529
 
4.4%
Other values (49) 4352
36.6%
ValueCountFrequency (%)
(space) 1365
 
11.4%
r 974
 
8.1%
e 884
 
7.4%
a 843
 
7.1%
i 677
 
5.7%
n 661
 
5.5%
s 657
 
5.5%
M 559
 
4.7%
o 490
 
4.1%
l 484
 
4.0%
Other values (49) 4361
36.5%

Most occurring categories

ValueCountFrequency (%)
(unknown) 11900
100.0%
ValueCountFrequency (%)
(unknown) 11955
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
(space) 1357
 
11.4%
r 948
 
8.0%
a 823
 
6.9%
e 810
 
6.8%
n 678
 
5.7%
i 662
 
5.6%
s 650
 
5.5%
M 554
 
4.7%
l 537
 
4.5%
o 529
 
4.4%
Other values (49) 4352
36.6%
ValueCountFrequency (%)
(space) 1365
 
11.4%
r 974
 
8.1%
e 884
 
7.4%
a 843
 
7.1%
i 677
 
5.7%
n 661
 
5.5%
s 657
 
5.5%
M 559
 
4.7%
o 490
 
4.1%
l 484
 
4.0%
Other values (49) 4361
36.5%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 11900
100.0%
ValueCountFrequency (%)
(unknown) 11955
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
(space) 1357
 
11.4%
r 948
 
8.0%
a 823
 
6.9%
e 810
 
6.8%
n 678
 
5.7%
i 662
 
5.6%
s 650
 
5.5%
M 554
 
4.7%
l 537
 
4.5%
o 529
 
4.4%
Other values (49) 4352
36.6%
ValueCountFrequency (%)
(space) 1365
 
11.4%
r 974
 
8.1%
e 884
 
7.4%
a 843
 
7.1%
i 677
 
5.7%
n 661
 
5.5%
s 657
 
5.5%
M 559
 
4.7%
o 490
 
4.1%
l 484
 
4.0%
Other values (49) 4361
36.5%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 11900
100.0%
ValueCountFrequency (%)
(unknown) 11955
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
(space) 1357
 
11.4%
r 948
 
8.0%
a 823
 
6.9%
e 810
 
6.8%
n 678
 
5.7%
i 662
 
5.6%
s 650
 
5.5%
M 554
 
4.7%
l 537
 
4.5%
o 529
 
4.4%
Other values (49) 4352
36.6%
ValueCountFrequency (%)
(space) 1365
 
11.4%
r 974
 
8.1%
e 884
 
7.4%
a 843
 
7.1%
i 677
 
5.7%
n 661
 
5.5%
s 657
 
5.5%
M 559
 
4.7%
o 490
 
4.1%
l 484
 
4.0%
Other values (49) 4361
36.5%

Sex
Categorical

Statistic       Dataset A    Dataset B
Distinct        2            2
Distinct (%)    0.4%         0.4%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Length

Statistic        Dataset A    Dataset B
Max length       6            6
Median length    4            4
Mean length      4.6726457    4.735426
Min length       4            4

Characters and Unicode

Statistic              Dataset A    Dataset B
Total characters       2084         2112
Distinct characters    5            5
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
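
For Sex, the total character count and mean length can be recovered from the category counts alone, since "male" has 4 characters and "female" has 6. The sketch below does that calculation; `length_stats` is an illustrative helper, and the counts are the ones reported for this variable.

```python
def length_stats(counts: dict) -> tuple:
    """Total characters and mean length for a {category: count} mapping."""
    total_chars = sum(len(value) * n for value, n in counts.items())
    n_values = sum(counts.values())
    return total_chars, total_chars / n_values


# Category counts for Sex, copied from the report.
for name, counts in (
    ("Dataset A", {"male": 296, "female": 150}),
    ("Dataset B", {"male": 282, "female": 164}),
):
    total, mean_length = length_stats(counts)
    print(f"{name}: total characters={total}, mean length={mean_length:.7f}")
```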

Unique

Statistic     Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

Row        Dataset A    Dataset B
1st row    male         male
2nd row    female       male
3rd row    female       male
4th row    male         female
5th row    male         female

Common Values

Dataset A
Value     Count    Frequency (%)
male      296      66.4%
female    150      33.6%

Dataset B
Value     Count    Frequency (%)
male      282      63.2%
female    164      36.8%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common-values plots for Dataset A and Dataset B]
ValueCountFrequency (%)
male 296
66.4%
female 150
33.6%
ValueCountFrequency (%)
male 282
63.2%
female 164
36.8%

Most occurring characters

ValueCountFrequency (%)
e 596
28.6%
m 446
21.4%
a 446
21.4%
l 446
21.4%
f 150
 
7.2%
ValueCountFrequency (%)
e 610
28.9%
m 446
21.1%
a 446
21.1%
l 446
21.1%
f 164
 
7.8%

Most occurring categories

ValueCountFrequency (%)
(unknown) 2084
100.0%
ValueCountFrequency (%)
(unknown) 2112
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
e 596
28.6%
m 446
21.4%
a 446
21.4%
l 446
21.4%
f 150
 
7.2%
ValueCountFrequency (%)
e 610
28.9%
m 446
21.1%
a 446
21.1%
l 446
21.1%
f 164
 
7.8%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 2084
100.0%
ValueCountFrequency (%)
(unknown) 2112
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
e 596
28.6%
m 446
21.4%
a 446
21.4%
l 446
21.4%
f 150
 
7.2%
ValueCountFrequency (%)
e 610
28.9%
m 446
21.1%
a 446
21.1%
l 446
21.1%
f 164
 
7.8%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 2084
100.0%
ValueCountFrequency (%)
(unknown) 2112
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
e 596
28.6%
m 446
21.4%
a 446
21.4%
l 446
21.4%
f 150
 
7.2%
ValueCountFrequency (%)
e 610
28.9%
m 446
21.1%
a 446
21.1%
l 446
21.1%
f 164
 
7.8%

Age
Real number (ℝ)

Statistic       Dataset A    Dataset B
Distinct        74           69
Distinct (%)    20.0%        19.3%
Missing         76           89
Missing (%)     17.0%        20.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            29.623649    29.714734
Minimum         0.75         0.75
Maximum         74           70
Zeros           0            0
Zeros (%)       0.0%         0.0%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

Statistic                    Dataset A    Dataset B
Minimum                      0.75         0.75
5-th percentile              6            3
Q1                           20           21
median                       28.5         28
Q3                           37.75        38
95-th percentile             56.55        56.4
Maximum                      74           70
Range                        73.25        69.25
Interquartile range (IQR)    17.75        17

Descriptive statistics

Statistic                          Dataset A        Dataset B
Standard deviation                 14.15058         14.511798
Coefficient of variation (CV)      0.47767849       0.48837045
Kurtosis                           0.290494         -0.11158716
Mean                               29.623649        29.714734
Median Absolute Deviation (MAD)    8.5              8
Skewness                           0.47303077       0.24958798
Sum                                10960.75         10608.16
Variance                           200.2389         210.59228
Monotonicity                       Not monotonic    Not monotonic
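
Age has missing values, and the reported mean is consistent with dividing the sum by the number of non-missing observations rather than by all 446 rows. The sketch below checks this; that missing values are excluded is an inference from the figures, and `mean_of_present` is an illustrative name.

```python
import math


def mean_of_present(total: float, n_rows: int, n_missing: int) -> float:
    """Mean over non-missing values only: sum / (rows - missing)."""
    return total / (n_rows - n_missing)


# Age: sum, row count and missing count copied from the report.
mean_a = mean_of_present(total=10960.75, n_rows=446, n_missing=76)
mean_b = mean_of_present(total=10608.16, n_rows=446, n_missing=89)

# Both agree with the reported means to within rounding.
print(math.isclose(mean_a, 29.623649, abs_tol=1e-5))
print(math.isclose(mean_b, 29.714734, abs_tol=1e-5))
```

Dividing by all 446 rows instead would give roughly 24.6 and 23.8, which does not match the report.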
[Figure: Histogram with fixed size bins (bins=50)]
Dataset A
Value                Count    Frequency (%)
22                   16       3.6%
18                   16       3.6%
19                   16       3.6%
25                   14       3.1%
21                   14       3.1%
24                   13       2.9%
30                   13       2.9%
29                   13       2.9%
32                   12       2.7%
33                   12       2.7%
Other values (64)    231      51.8%
(Missing)            76       17.0%

Dataset B
Value                Count    Frequency (%)
22                   19       4.3%
28                   16       3.6%
25                   14       3.1%
36                   13       2.9%
21                   12       2.7%
18                   12       2.7%
27                   11       2.5%
29                   10       2.2%
32                   10       2.2%
34                   10       2.2%
Other values (59)    230      51.6%
(Missing)            89       20.0%

Dataset A
Value    Count    Frequency (%)
0.75     1        0.2%
1        3        0.7%
2        5        1.1%
3        4        0.9%
4        5        1.1%
6        3        0.7%
7        2        0.4%
8        2        0.4%
9        3        0.7%
10       1        0.2%

Dataset B
Value    Count    Frequency (%)
0.75     2        0.4%
0.83     2        0.4%
1        5        1.1%
2        6        1.3%
3        4        0.9%
4        5        1.1%
5        1        0.2%
6        2        0.4%
8        2        0.4%
9        3        0.7%

SibSp
Real number (ℝ)

Statistic       Dataset A    Dataset B
Distinct        7            7
Distinct (%)    1.6%         1.6%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            0.5470852    0.54484305
Minimum         0            0
Maximum         8            8
Zeros           298          304
Zeros (%)       66.8%        68.2%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

Statistic                    Dataset A    Dataset B
Minimum                      0            0
5-th percentile              0            0
Q1                           0            0
median                       0            0
Q3                           1            1
95-th percentile             3            3
Maximum                      8            8
Range                        8            8
Interquartile range (IQR)    1            1

Descriptive statistics

Statistic                          Dataset A        Dataset B
Standard deviation                 1.0918608        1.1866326
Coefficient of variation (CV)      1.9957784        2.1779347
Kurtosis                           15.79564         17.948964
Mean                               0.5470852        0.54484305
Median Absolute Deviation (MAD)    0                0
Skewness                           3.406358         3.7998006
Sum                                244              243
Variance                           1.19216          1.4080969
Monotonicity                       Not monotonic    Not monotonic
[Figure: Histogram with fixed size bins (bins=7)]
Dataset A
Value    Count    Frequency (%)
0        298      66.8%
1        107      24.0%
2        14       3.1%
3        12       2.7%
4        11       2.5%
8        3        0.7%
5        1        0.2%

Dataset B
Value    Count    Frequency (%)
0        304      68.2%
1        105      23.5%
2        13       2.9%
4        9        2.0%
3        7        1.6%
8        5        1.1%
5        3        0.7%

Dataset A
Value    Count    Frequency (%)
0        298      66.8%
1        107      24.0%
2        14       3.1%
3        12       2.7%
4        11       2.5%
5        1        0.2%
8        3        0.7%

Dataset B
Value    Count    Frequency (%)
0        304      68.2%
1        105      23.5%
2        13       2.9%
3        7        1.6%
4        9        2.0%
5        3        0.7%
8        5        1.1%

Parch
Real number (ℝ)

Statistic       Dataset A     Dataset B
Distinct        6             6
Distinct (%)    1.3%          1.3%
Missing         0             0
Missing (%)     0.0%          0.0%
Infinite        0             0
Infinite (%)    0.0%          0.0%
Mean            0.41255605    0.38340807
Minimum         0             0
Maximum         5             5
Zeros           332           337
Zeros (%)       74.4%         75.6%
Negative        0             0
Negative (%)    0.0%          0.0%
Memory size     7.0 KiB       7.0 KiB

Quantile statistics

Statistic                    Dataset A    Dataset B
Minimum                      0            0
5-th percentile              0            0
Q1                           0            0
median                       0            0
Q3                           1            0
95-th percentile             2            2
Maximum                      5            5
Range                        5            5
Interquartile range (IQR)    1            0

Descriptive statistics

Statistic                          Dataset A        Dataset B
Standard deviation                 0.85339992       0.78674014
Coefficient of variation (CV)      2.0685672        2.0519655
Kurtosis                           8.9281957        8.4967198
Mean                               0.41255605       0.38340807
Median Absolute Deviation (MAD)    0                0
Skewness                           2.7150907        2.5729338
Sum                                184              171
Variance                           0.72829143       0.61896004
Monotonicity                       Not monotonic    Not monotonic
[Figure: Histogram with fixed size bins (bins=6)]
Dataset A
Value    Count    Frequency (%)
0        332      74.4%
1        66       14.8%
2        38       8.5%
5        4        0.9%
4        4        0.9%
3        2        0.4%

Dataset B
Value    Count    Frequency (%)
0        337      75.6%
1        60       13.5%
2        43       9.6%
5        3        0.7%
3        2        0.4%
4        1        0.2%

Dataset A
Value    Count    Frequency (%)
0        332      74.4%
1        66       14.8%
2        38       8.5%
3        2        0.4%
4        4        0.9%
5        4        0.9%

Dataset B
Value    Count    Frequency (%)
0        337      75.6%
1        60       13.5%
2        43       9.6%
3        2        0.4%
4        1        0.2%
5        3        0.7%

Ticket
Text

Statistic       Dataset A    Dataset B
Distinct        375          373
Distinct (%)    84.1%        83.6%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Length

Statistic        Dataset A    Dataset B
Max length       18           18
Median length    17           17
Mean length      6.8587444    6.6950673
Min length       3            4

Characters and Unicode

Statistic              Dataset A    Dataset B
Total characters       3059         2986
Distinct characters    31           35
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic     Dataset A    Dataset B
Unique        324          323
Unique (%)    72.6%        72.4%

Sample

Row        Dataset A     Dataset B
1st row    19988         PC 17600
2nd row    4137          1601
3rd row    W./C. 6607    CA. 2343
4th row    2671          113760
5th row    345769        2678
ValueCountFrequency (%)
pc 31
 
5.5%
c.a 13
 
2.3%
a/5 8
 
1.4%
ston/o 8
 
1.4%
2 8
 
1.4%
sc/paris 6
 
1.1%
347082 6
 
1.1%
soton/oq 5
 
0.9%
3101295 5
 
0.9%
347088 5
 
0.9%
Other values (392) 473
83.3%
ValueCountFrequency (%)
pc 35
 
6.2%
c.a 9
 
1.6%
ca 8
 
1.4%
347082 5
 
0.9%
2343 5
 
0.9%
ston/o2 5
 
0.9%
ston/o 5
 
0.9%
2 5
 
0.9%
sc/paris 4
 
0.7%
a/5 4
 
0.7%
Other values (393) 477
84.9%

Most occurring characters

ValueCountFrequency (%)
3 373
12.2%
1 347
11.3%
2 299
9.8%
7 265
8.7%
4 224
 
7.3%
0 213
 
7.0%
6 206
 
6.7%
5 178
 
5.8%
9 176
 
5.8%
8 150
 
4.9%
Other values (21) 628
20.5%
ValueCountFrequency (%)
3 376
12.6%
1 364
12.2%
2 301
10.1%
7 252
8.4%
4 231
7.7%
6 207
 
6.9%
0 199
 
6.7%
5 179
 
6.0%
9 162
 
5.4%
8 138
 
4.6%
Other values (25) 577
19.3%

Most occurring categories

ValueCountFrequency (%)
(unknown) 3059
100.0%
ValueCountFrequency (%)
(unknown) 2986
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
3 373
12.2%
1 347
11.3%
2 299
9.8%
7 265
8.7%
4 224
 
7.3%
0 213
 
7.0%
6 206
 
6.7%
5 178
 
5.8%
9 176
 
5.8%
8 150
 
4.9%
Other values (21) 628
20.5%
ValueCountFrequency (%)
3 376
12.6%
1 364
12.2%
2 301
10.1%
7 252
8.4%
4 231
7.7%
6 207
 
6.9%
0 199
 
6.7%
5 179
 
6.0%
9 162
 
5.4%
8 138
 
4.6%
Other values (25) 577
19.3%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 3059
100.0%
ValueCountFrequency (%)
(unknown) 2986
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
3 373
12.2%
1 347
11.3%
2 299
9.8%
7 265
8.7%
4 224
 
7.3%
0 213
 
7.0%
6 206
 
6.7%
5 178
 
5.8%
9 176
 
5.8%
8 150
 
4.9%
Other values (21) 628
20.5%
ValueCountFrequency (%)
3 376
12.6%
1 364
12.2%
2 301
10.1%
7 252
8.4%
4 231
7.7%
6 207
 
6.9%
0 199
 
6.7%
5 179
 
6.0%
9 162
 
5.4%
8 138
 
4.6%
Other values (25) 577
19.3%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 3059
100.0%
ValueCountFrequency (%)
(unknown) 2986
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
3 373
12.2%
1 347
11.3%
2 299
9.8%
7 265
8.7%
4 224
 
7.3%
0 213
 
7.0%
6 206
 
6.7%
5 178
 
5.8%
9 176
 
5.8%
8 150
 
4.9%
Other values (21) 628
20.5%
ValueCountFrequency (%)
3 376
12.6%
1 364
12.2%
2 301
10.1%
7 252
8.4%
4 231
7.7%
6 207
 
6.9%
0 199
 
6.7%
5 179
 
6.0%
9 162
 
5.4%
8 138
 
4.6%
Other values (25) 577
19.3%

Fare
Real number (ℝ)

Statistic       Dataset A    Dataset B
Distinct        167          182
Distinct (%)    37.4%        40.8%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            33.495898    34.731211
Minimum         0            0
Maximum         512.3292     512.3292
Zeros           6            8
Zeros (%)       1.3%         1.8%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

Statistic                    Dataset A    Dataset B
Minimum                      0            0
5-th percentile              7.225        7.225
Q1                           7.925        7.925
median                       14.5         15.5
Q3                           30.92395     32.596875
95-th percentile             118.31875    133.65
Maximum                      512.3292     512.3292
Range                        512.3292     512.3292
Interquartile range (IQR)    22.99895     24.671875

Descriptive statistics

Statistic                          Dataset A        Dataset B
Standard deviation                 54.314729        52.845168
Coefficient of variation (CV)      1.6215337        1.521547
Kurtosis                           30.627742        31.953129
Mean                               33.495898        34.731211
Median Absolute Deviation (MAD)    7.25             8.21875
Skewness                           4.7212937        4.6474074
Sum                                14939.171        15490.12
Variance                           2950.0898        2792.6118
Monotonicity                       Not monotonic    Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Dataset A
Value                 Count    Frequency (%)
8.05                  28       6.3%
13                    18       4.0%
7.8958                17       3.8%
7.75                  16       3.6%
7.775                 13       2.9%
26                    13       2.9%
7.2292                10       2.2%
7.925                 10       2.2%
10.5                  10       2.2%
26.55                 9        2.0%
Other values (157)    302      67.7%

Dataset B
Value                 Count    Frequency (%)
13                    23       5.2%
8.05                  23       5.2%
7.8958                21       4.7%
26                    18       4.0%
7.75                  12       2.7%
7.925                 10       2.2%
7.2292                9        2.0%
7.775                 9        2.0%
26.55                 8        1.8%
0                     8        1.8%
Other values (172)    305      68.4%

Dataset A
Value     Count    Frequency (%)
0         6        1.3%
4.0125    1        0.2%
5         1        0.2%
6.4958    1        0.2%
6.95      1        0.2%
6.975     1        0.2%
7.05      4        0.9%
7.0542    1        0.2%
7.125     3        0.7%
7.1417    1        0.2%

Dataset B
Value     Count    Frequency (%)
0         8        1.8%
4.0125    1        0.2%
6.2375    1        0.2%
6.4375    1        0.2%
6.45      1        0.2%
6.4958    1        0.2%
6.8583    1        0.2%
6.95      1        0.2%
7.0458    1        0.2%
7.05      2        0.4%

Cabin
Text

Statistic       Dataset A    Dataset B
Distinct        85           92
Distinct (%)    84.2%        85.2%
Missing         345          338
Missing (%)     77.4%        75.8%
Memory size     7.0 KiB      7.0 KiB

Length

Statistic        Dataset A    Dataset B
Max length       15           15
Median length    3            3
Mean length      3.9306931    3.5833333
Min length       1            1

Characters and Unicode

Statistic              Dataset A    Dataset B
Total characters       397          387
Distinct characters    19           19
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic     Dataset A    Dataset B
Unique        70           78
Unique (%)    69.3%        72.2%
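
For Cabin, the Distinct (%) and Unique (%) figures are consistent with using the number of non-missing values as the denominator (446 − 345 = 101 for Dataset A, 446 − 338 = 108 for Dataset B), not the full row count. The sketch below checks this; the denominator is inferred from the reported numbers, and `pct_of_present` is an illustrative helper.

```python
def pct_of_present(count: int, n_rows: int, n_missing: int) -> str:
    """Format a count as a percentage of non-missing rows, to one decimal."""
    return f"{100.0 * count / (n_rows - n_missing):.1f}%"


# Cabin: (distinct, unique, missing) per dataset, copied from the report.
cabin = {"Dataset A": (85, 70, 345), "Dataset B": (92, 78, 338)}

for name, (distinct, unique, missing) in cabin.items():
    print(
        f"{name}: distinct={pct_of_present(distinct, 446, missing)}, "
        f"unique={pct_of_present(unique, 446, missing)}"
    )
```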

Sample

Row        Dataset A    Dataset B
1st row    C106         B96 B98
2nd row    B71          C124
3rd row    C2           C32
4th row    B37          B4
5th row    E8           B39
ValueCountFrequency (%)
f 4
 
3.1%
c23 3
 
2.3%
c25 3
 
2.3%
c27 3
 
2.3%
b57 2
 
1.6%
c52 2
 
1.6%
b63 2
 
1.6%
b66 2
 
1.6%
c126 2
 
1.6%
b59 2
 
1.6%
Other values (88) 103
80.5%
ValueCountFrequency (%)
b96 3
 
2.4%
b98 3
 
2.4%
f2 3
 
2.4%
f 3
 
2.4%
d17 2
 
1.6%
c124 2
 
1.6%
d33 2
 
1.6%
g6 2
 
1.6%
b22 2
 
1.6%
g73 2
 
1.6%
Other values (95) 102
81.0%

Most occurring characters

ValueCountFrequency (%)
C 39
9.8%
B 39
9.8%
2 36
 
9.1%
1 34
 
8.6%
5 30
 
7.6%
6 29
 
7.3%
3 28
 
7.1%
(space) 27
 
6.8%
9 20
 
5.0%
7 18
 
4.5%
Other values (9) 97
24.4%
ValueCountFrequency (%)
2 39
 
10.1%
C 39
 
10.1%
1 33
 
8.5%
B 32
 
8.3%
6 29
 
7.5%
3 28
 
7.2%
5 22
 
5.7%
D 22
 
5.7%
8 21
 
5.4%
4 19
 
4.9%
Other values (9) 103
26.6%

Most occurring categories

ValueCountFrequency (%)
(unknown) 397
100.0%
ValueCountFrequency (%)
(unknown) 387
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
C 39
9.8%
B 39
9.8%
2 36
 
9.1%
1 34
 
8.6%
5 30
 
7.6%
6 29
 
7.3%
3 28
 
7.1%
(space) 27
 
6.8%
9 20
 
5.0%
7 18
 
4.5%
Other values (9) 97
24.4%
ValueCountFrequency (%)
2 39
 
10.1%
C 39
 
10.1%
1 33
 
8.5%
B 32
 
8.3%
6 29
 
7.5%
3 28
 
7.2%
5 22
 
5.7%
D 22
 
5.7%
8 21
 
5.4%
4 19
 
4.9%
Other values (9) 103
26.6%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 397
100.0%
ValueCountFrequency (%)
(unknown) 387
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
C 39
9.8%
B 39
9.8%
2 36
 
9.1%
1 34
 
8.6%
5 30
 
7.6%
6 29
 
7.3%
3 28
 
7.1%
(space) 27
 
6.8%
9 20
 
5.0%
7 18
 
4.5%
Other values (9) 97
24.4%
ValueCountFrequency (%)
2 39
 
10.1%
C 39
 
10.1%
1 33
 
8.5%
B 32
 
8.3%
6 29
 
7.5%
3 28
 
7.2%
5 22
 
5.7%
D 22
 
5.7%
8 21
 
5.4%
4 19
 
4.9%
Other values (9) 103
26.6%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 397
100.0%
ValueCountFrequency (%)
(unknown) 387
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
C 39
9.8%
B 39
9.8%
2 36
 
9.1%
1 34
 
8.6%
5 30
 
7.6%
6 29
 
7.3%
3 28
 
7.1%
(space) 27
 
6.8%
9 20
 
5.0%
7 18
 
4.5%
Other values (9) 97
24.4%
ValueCountFrequency (%)
2 39
 
10.1%
C 39
 
10.1%
1 33
 
8.5%
B 32
 
8.3%
6 29
 
7.5%
3 28
 
7.2%
5 22
 
5.7%
D 22
 
5.7%
8 21
 
5.4%
4 19
 
4.9%
Other values (9) 103
26.6%

Embarked
Categorical

Statistic       Dataset A    Dataset B
Distinct        3            3
Distinct (%)    0.7%         0.7%
Missing         0            1
Missing (%)     0.0%         0.2%
Memory size     7.0 KiB      7.0 KiB

Length

Statistic        Dataset A    Dataset B
Max length       1            1
Median length    1            1
Mean length      1            1
Min length       1            1

Characters and Unicode

Statistic              Dataset A    Dataset B
Total characters       446          445
Distinct characters    3            3
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic     Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

Row        Dataset A    Dataset B
1st row    S            C
2nd row    S            S
3rd row    S            S
4th row    C            S
5th row    S            C

Common Values

Dataset A
Value        Count    Frequency (%)
S            322      72.2%
C            89       20.0%
Q            35       7.8%

Dataset B
Value        Count    Frequency (%)
S            309      69.3%
C            99       22.2%
Q            37       8.3%
(Missing)    1        0.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common-values plots for Dataset A and Dataset B]
ValueCountFrequency (%)
s 322
72.2%
c 89
 
20.0%
q 35
 
7.8%
ValueCountFrequency (%)
s 309
69.4%
c 99
 
22.2%
q 37
 
8.3%

Most occurring characters

ValueCountFrequency (%)
S 322
72.2%
C 89
 
20.0%
Q 35
 
7.8%
ValueCountFrequency (%)
S 309
69.4%
C 99
 
22.2%
Q 37
 
8.3%

Most occurring categories

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 445
100.0%

Most frequent character per category

(unknown)
ValueCountFrequency (%)
S 322
72.2%
C 89
 
20.0%
Q 35
 
7.8%
ValueCountFrequency (%)
S 309
69.4%
C 99
 
22.2%
Q 37
 
8.3%

Most occurring scripts

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 445
100.0%

Most frequent character per script

(unknown)
ValueCountFrequency (%)
S 322
72.2%
C 89
 
20.0%
Q 35
 
7.8%
ValueCountFrequency (%)
S 309
69.4%
C 99
 
22.2%
Q 37
 
8.3%

Most occurring blocks

ValueCountFrequency (%)
(unknown) 446
100.0%
ValueCountFrequency (%)
(unknown) 445
100.0%

Most frequent character per block

(unknown)
ValueCountFrequency (%)
S 322
72.2%
C 89
 
20.0%
Q 35
 
7.8%
ValueCountFrequency (%)
S 309
69.4%
C 99
 
22.2%
Q 37
 
8.3%

Interactions

[Figures: pairwise interaction plots for Dataset A and Dataset B; images not reproduced]

Dataset B

2024-09-07T10:04:50.730479image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/

Dataset A

2024-09-07T10:04:48.018512image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/

Dataset B

2024-09-07T10:04:51.206020image/svg+xmlMatplotlib v3.9.2, https://matplotlib.org/

Correlations

[Figures: correlation heatmaps for Dataset A and Dataset B]

Dataset A

             Age     Embarked  Fare    Parch   PassengerId  Pclass  Sex     SibSp   Survived
Age           1.000   0.115     0.117  -0.210   0.086        0.271   0.129  -0.235   0.122
Embarked      0.115   1.000     0.199   0.000   0.000        0.238   0.000   0.084   0.129
Fare          0.117   0.199     1.000   0.442  -0.024        0.467   0.210   0.473   0.275
Parch        -0.210   0.000     0.442   1.000   0.038        0.076   0.305   0.510   0.165
PassengerId   0.086   0.000    -0.024   0.038   1.000        0.062   0.063  -0.045   0.000
Pclass        0.271   0.238     0.467   0.076   0.062        1.000   0.158   0.146   0.333
Sex           0.129   0.000     0.210   0.305   0.063        0.158   1.000   0.274   0.537
SibSp        -0.235   0.084     0.473   0.510  -0.045        0.146   0.274   1.000   0.211
Survived      0.122   0.129     0.275   0.165   0.000        0.333   0.537   0.211   1.000

Dataset B

             Age     Embarked  Fare    Parch   PassengerId  Pclass  Sex     SibSp   Survived
Age           1.000   0.006     0.136  -0.247   0.005        0.260   0.000  -0.202   0.158
Embarked      0.006   1.000     0.211   0.025   0.000        0.291   0.139   0.079   0.215
Fare          0.136   0.211     1.000   0.410  -0.008        0.456   0.238   0.415   0.303
Parch        -0.247   0.025     0.410   1.000  -0.015        0.000   0.198   0.436   0.089
PassengerId   0.005   0.000    -0.008  -0.015   1.000        0.000   0.000  -0.085   0.082
Pclass        0.260   0.291     0.456   0.000   0.000        1.000   0.141   0.153   0.373
Sex           0.000   0.139     0.238   0.198   0.000        0.141   1.000   0.186   0.576
SibSp        -0.202   0.079     0.415   0.436  -0.085        0.153   0.186   1.000   0.207
Survived      0.158   0.215     0.303   0.089   0.082        0.373   0.576   0.207   1.000
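
The High Correlation alerts in the overview can be cross-checked against these coefficients. The sketch below is not ydata-profiling's own implementation. It assumes a cutoff of 0.5 on the absolute coefficient (an assumption that is consistent with the alerts shown, not a documented setting of this report) and uses a hand-copied subset of the values for Parch, Sex, SibSp, and Survived. The name `high_correlation_pairs` is illustrative.

```python
# Pairwise coefficients copied from the correlation tables (subset of variables).
CORR_A = {
    ("Parch", "Sex"): 0.305,
    ("Parch", "SibSp"): 0.510,
    ("Parch", "Survived"): 0.165,
    ("Sex", "SibSp"): 0.274,
    ("Sex", "Survived"): 0.537,
    ("SibSp", "Survived"): 0.211,
}
CORR_B = {
    ("Parch", "Sex"): 0.198,
    ("Parch", "SibSp"): 0.436,
    ("Parch", "Survived"): 0.089,
    ("Sex", "SibSp"): 0.186,
    ("Sex", "Survived"): 0.576,
    ("SibSp", "Survived"): 0.207,
}


def high_correlation_pairs(corr, threshold=0.5):
    """Return the sorted variable pairs whose |coefficient| meets the threshold.

    The 0.5 default is an assumed cutoff, chosen only for illustration.
    """
    return sorted(pair for pair, value in corr.items() if abs(value) >= threshold)


print(high_correlation_pairs(CORR_A))  # [('Parch', 'SibSp'), ('Sex', 'Survived')]
print(high_correlation_pairs(CORR_B))  # [('Sex', 'Survived')]
```

Under that assumed cutoff, the Parch/SibSp pair is flagged only for Dataset A (0.510 versus 0.436), which matches the alert being absent for Dataset B.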

Missing values

Dataset A

[Figure: missing-value count per column]
A simple visualization of nullity by column.

Dataset B

[Figure: missing-value count per column]
A simple visualization of nullity by column.

Dataset A

[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Dataset B

[Figure: nullity matrix]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Dataset A

[Figure: nullity correlation heatmap]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable is associated with the presence of another.

Dataset B

[Figure: nullity correlation heatmap]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable is associated with the presence of another.
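
To make the two quantities behind these views concrete, here is a minimal standard-library sketch: a per-column missing count, and the Pearson correlation between two columns' 0/1 missingness indicators. The toy records and function names are hypothetical and are not taken from Dataset A or B; the report's own plots come from its plotting backend, not from this code.

```python
from math import sqrt

# Hypothetical toy records; None marks a missing cell.
ROWS = [
    {"Age": 22.0, "Cabin": None, "Fare": 7.25},
    {"Age": None, "Cabin": None, "Fare": 8.05},
    {"Age": 38.0, "Cabin": "C85", "Fare": 71.28},
    {"Age": None, "Cabin": None, "Fare": 7.75},
]


def missing_counts(rows):
    """Count missing cells per column (the quantity a nullity bar chart shows)."""
    return {col: sum(row[col] is None for row in rows) for col in rows[0]}


def nullity_correlation(rows, a, b):
    """Pearson correlation between the missingness indicators of columns a and b.

    Returns None when either column is never or always missing, because the
    correlation is undefined for a constant indicator.
    """
    x = [float(row[a] is None) for row in rows]
    y = [float(row[b] is None) for row in rows]
    n = len(rows)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


print(missing_counts(ROWS))
print(nullity_correlation(ROWS, "Age", "Cabin"))
```

A value near 1 means the two columns tend to be missing in the same rows; a value near -1 means one tends to be present when the other is missing.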

Sample

Dataset A

     PassengerId  Survived  Pclass  Name                                      Sex     Age    SibSp  Parch  Ticket      Fare      Cabin    Embarked
298  299          1         1       Saalfeld, Mr. Adolphe                     male    NaN    0      0      19988       30.5000   C106     S
402  403          0         3       Jussila, Miss. Mari Aina                  female  21.0   1      0      4137        9.8250    NaN      S
888  889          0         3       Johnston, Miss. Catherine Helen "Carrie"  female  NaN    1      2      W./C. 6607  23.4500   NaN      S
832  833          0         3       Saad, Mr. Amin                            male    NaN    0      0      2671        7.2292    NaN      C
441  442          0         3       Hampe, Mr. Leon                           male    20.0   0      0      345769      9.5000    NaN      S
107  108          1         3       Moss, Mr. Albert Johan                    male    NaN    0      0      312991      7.7750    NaN      S
671  672          0         1       Davidson, Mr. Thornton                    male    31.0   1      0      F.C. 12750  52.0000   B71      S
388  389          0         3       Sadlier, Mr. Matthew                      male    NaN    0      0      367655      7.7292    NaN      Q
151  152          1         1       Pears, Mrs. Thomas (Edith Wearne)         female  22.0   1      0      113776      66.6000   C2       S
428  429          0         3       Flynn, Mr. James                          male    NaN    0      0      364851      7.7500    NaN      Q

Dataset B

     PassengerId  Survived  Pclass  Name                                         Sex     Age    SibSp  Parch  Ticket    Fare      Cabin    Embarked
793  794          0         1       Hoyt, Mr. William Fisher                     male    NaN    0      0      PC 17600  30.6958   NaN      C
169  170          0         3       Ling, Mr. Lee                                male    28.00  0      0      1601      56.4958   NaN      S
324  325          0         3       Sage, Mr. George John Jr                     male    NaN    8      2      CA. 2343  69.5500   NaN      S
763  764          1         1       Carter, Mrs. William Ernest (Lucile Polk)    female  36.00  1      2      113760    120.0000  B96 B98  S
140  141          0         3       Boulos, Mrs. Joseph (Sultana)                female  NaN    0      2      2678      15.2458   NaN      C
443  444          1         2       Reynaldo, Ms. Encarnacion                    female  28.00  0      0      230434    13.0000   NaN      S
331  332          0         1       Partner, Mr. Austen                          male    45.50  0      0      113043    28.5000   C124     S
856  857          1         1       Wick, Mrs. George Dennick (Mary Hitchcock)   female  45.00  1      1      36928     164.8667  NaN      S
567  568          0         3       Palsson, Mrs. Nils (Alma Cornelia Berglund)  female  29.00  0      4      349909    21.0750   NaN      S
831  832          1         2       Richards, Master. George Sibley              male    0.83   1      1      29106     18.7500   NaN      S

Dataset A

     PassengerId  Survived  Pclass  Name                               Sex     Age    SibSp  Parch  Ticket      Fare      Cabin    Embarked
95   96           0         3       Shorney, Mr. Charles Joseph        male    NaN    0      0      374910      8.0500    NaN      S
720  721          1         2       Harper, Miss. Annie Jessie "Nina"  female  6.0    0      1      248727      33.0000   NaN      S
306  307          1         1       Fleming, Miss. Margaret            female  NaN    0      0      17421       110.8833  NaN      C
865  866          1         2       Bystrom, Mrs. (Karolina)           female  42.0   0      0      236852      13.0000   NaN      S
66   67           1         2       Nye, Mrs. (Elizabeth Ramell)       female  29.0   0      0      C.A. 29395  10.5000   F33      S
408  409          0         3       Birkeland, Mr. Hans Martin Monsen  male    21.0   0      0      312992      7.7750    NaN      S
443  444          1         2       Reynaldo, Ms. Encarnacion          female  28.0   0      0      230434      13.0000   NaN      S
100  101          0         3       Petranec, Miss. Matilda            female  28.0   0      0      349245      7.8958    NaN      S
189  190          0         3       Turcin, Mr. Stjepan                male    36.0   0      0      349247      7.8958    NaN      S
203  204          0         3       Youseff, Mr. Gerious               male    45.5   0      0      2628        7.2250    NaN      C

Dataset B

     PassengerId  Survived  Pclass  Name                                        Sex     Age    SibSp  Parch  Ticket            Fare      Cabin    Embarked
430  431          1         1       Bjornstrom-Steffansson, Mr. Mauritz Hakan   male    28.0   0      0      110564            26.5500   C52      S
185  186          0         1       Rood, Mr. Hugh Roscoe                       male    NaN    0      0      113767            50.0000   A32      S
505  506          0         1       Penasco y Castellana, Mr. Victor de Satode  male    18.0   1      0      PC 17758          108.9000  C65      C
378  379          0         3       Betros, Mr. Tannous                         male    20.0   0      0      2648              4.0125    NaN      C
683  684          0         3       Goodwin, Mr. Charles Edward                 male    14.0   5      2      CA 2144           46.9000   NaN      S
339  340          0         1       Blackwell, Mr. Stephen Weart                male    45.0   0      0      113784            35.5000   T        S
122  123          0         2       Nasser, Mr. Nicholas                        male    32.5   1      0      237736            30.0708   NaN      C
816  817          0         3       Heininen, Miss. Wendla Maria                female  23.0   0      0      STON/O2. 3101290  7.9250    NaN      S
24   25           0         3       Palsson, Miss. Torborg Danira               female  8.0    3      1      349909            21.0750   NaN      S
842  843          1         1       Serepeca, Miss. Augusta                     female  30.0   0      0      113798            31.0000   NaN      C

Duplicate rows

Dataset A

Dataset does not contain duplicate rows.

Dataset B

Dataset does not contain duplicate rows.
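
One way to reproduce this check outside the report is to count identical rows. The following is a minimal standard-library sketch; the function name and the toy rows are hypothetical, and it is not how ydata-profiling itself builds the duplicates table.

```python
from collections import Counter


def duplicate_rows(rows):
    """Return {row: count} for every row that occurs more than once.

    Rows are compared as whole tuples, so two rows count as duplicates only
    if every column value matches.
    """
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


# Hypothetical rows: (PassengerId, Survived, Pclass)
unique_rows = [(1, 0, 3), (2, 1, 1), (3, 1, 3)]
repeated_rows = unique_rows + [(2, 1, 1)]

print(duplicate_rows(unique_rows))    # {}
print(duplicate_rows(repeated_rows))  # {(2, 1, 1): 2}
```

Because PassengerId is unique in both datasets, no two full rows can match, which is consistent with the empty result reported here.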